The dataset I chose to analyze was the Prosper Loan Data. My financial history lends me a particular interest in learning more about the story of loans through the data.
Bivariate Plots


Credit Scores (“ScoreX”):
| 3 | 1 | 5 | 36 | 141 | 346 | 554 | 1593 | 1474 | 1357 | 1125 | 3600 | 4168 | 8655 | 10213 | 10534 | 9998 | 9083 | 7378 | 5582 | 4045 | 2338 | 1300 | 545 | 211 | 27 |
Credit Scores (“FICO08”):
| 3530 | 6135 | 5950 | 5459 | 3815 | 1871 | 1011 | 575 | 303 | 106 | 18 |

Interestingly, the number of loans by Credit Grade follows a roughly normal distribution, with “C” or “D” graded borrowers receiving the highest number of loans, depending on the rating configuration of the site. I am curious whether Prosper deliberately groups a broader subset of applicants into these middle grades in an attempt to highlight those at the extreme ends of the scale, or whether the distribution simply reflects the type of applicants the site is getting. Presumably, better-qualified borrowers have more and potentially better options than what Prosper offers. Conversely, less-qualified borrowers are likely not getting approved for as many loans on the site as their slightly better-qualified counterparts.
When separating by Credit Score System and Prosper Rating System, the distributions become less normal, but the middle rated borrowers still account for the largest numbers of issued loans from the website. We do see that the percentage of loans going to “D”, “E”, and “HR” relative to the other rated loans seems to drop after the switch to FICO08. This could be due to a stricter rating process on Prosper’s end or due to Investors being more discriminating with the improved information.
This led me to wonder if it would be possible to determine if changes in Investor behavior were caused by this change. This will be investigated further with a multivariate plot.

Percentage of Loans with one investor: 24.47
Correlation Coefficient of Investors and Prosper Credit Grade: -0.3476
As we saw in the previous section, a large number of loans have only one investor. We can see this influence when plotting Investors by Credit Grade. The first quartile of A, B and C rated Borrowers is at or near one.


New Debt To Income Ratio Stats:
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA's |
|---|---|---|---|---|---|---|
| 0.0001859 | 0.1875 | 0.2737 | 0.3115 | 0.3843 | 13.95 | 9795 |
Naturally, those applying for Debt Consolidation loans had a higher debt to income ratio prior to acceptance of the new loan. However, while borrowers of these loans likely remained at a similar debt to income level after securing the consolidation loan and presumably paying off other debt, non-consolidation borrowers saw an increase in debt to income ratio, which was also to be expected.

Correlation Coefficient of BorrowerRate and LoanOriginalAmount: -0.3296
Correlation Coefficient of BorrowerRate and Prosper Credit Grade: 0.8792
BorrowerRate and LoanOriginalAmount are only weakly correlated, whereas BorrowerRate and Prosper Credit Grade are highly correlated, as the statistic, the separation in the plot and the website would suggest. Yet this does not tell the whole story. Previous iterations of this plot suggested that BorrowerRate went down as LoanOriginalAmount went up. Separating by Credit Grade reveals that interest rates are generally flat within a Credit Grade group, with the exception of the lower rated borrowers, who have greater variability. We see that BorrowerRate gets lower as Loan Amount goes up for the worst rated group. I suspect that this is because the only “HR” borrowers eligible for higher loan amounts have something in their credit profile that affords them more eligibility than the average “HR” borrower. For example, these higher loan amount “HR” borrowers may have better credit scores than the average “HR” borrower, giving them a lower interest rate, but have a large number of delinquencies in the past 7 years, which relegates them to the “HR” category.

Correlation Coefficient of BorrowerRate and Term: 0.0201

Correlation Coefficient of BorrowerRate and DebtToIncomeRatio: 0.06296
Clearly, the Prosper Credit Rating has a much bigger effect on the loan’s interest rate than the original loan amount, term or the borrower’s debt to income ratio.

Correlation Coefficient of BorrowerRate and avgCredit: -0.4871

It looks like Credit Score has a similar effect on loan interest rate to Prosper Credit Grade. One would assume that the two must be closely related.

Correlation Coefficient of ProsperCreditGrade and avgCredit with the old rating system and ScoreX: -0.9788
Correlation Coefficient of ProsperCreditGrade and avgCredit with the new rating system and ScoreX: -0.5823
Correlation Coefficient of ProsperCreditGrade and avgCredit with the new rating system and FICO08: -0.5927
From the plots and the correlations above, I believe the original Prosper Credit Rating may have been a simple aggregation of credit scores into easier to read letter grades. After the new rating formula was implemented, credit score was clearly heavily involved, but no longer the sole factor in assigning a Credit Rating.
Looking at this from another angle, I wonder what the mean credit score is for a given interest rate.

Consistent with the previous findings, those with higher credit scores qualify for better interest rates. This is by design and controlled by the website. It was interesting however to see an uptick in mean and median credit scores at the very high end of interest rates. I wonder if this is due to the fact that the higher interest rates are associated with either much larger original loan amounts or much longer terms that only better qualified borrowers would be approved for.
Term Summary of Loans with Borrower Rate Greater Than or Equal to 0.29:
Term Summary of Loans with Borrower Rate Less Than 0.29:
Original Loan Amount Summary of Loans with Borrower Rate Greater Than or Equal to 0.29:
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1000 | 2500 | 4000 | 3910 | 4000 | 25000 |
Original Loan Amount Summary of Loans with Borrower Rate Less Than 0.29:
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1000 | 4000 | 7500 | 9091 | 13500 | 35000 |
Number of Loans with Borrower Rate between 0.275 and 0.3: 7452
Number of Loans with Borrower Rate Between 0.3 and 0.325: 9257
I was quite wrong about why the average credit score is higher among borrowers with the highest interest rate loans. The loan amount summaries above show that loans with rates at or above 29% were smaller, not larger, than those below it. I even surmised that a relatively low volume of loans above 30% could be skewing the numbers, but there were more loans with rates from 30-32.5% (9257) than from 27.5-30% (7452).

Correlation Coefficient of Investors and BorrowerRate: -0.274
Correlation Coefficient of Investors and BorrowerRate on loans with more than 1 investor: -0.4176
As we have seen previously, the large number of loans with only one investor may be affecting the data. Here, the negative correlation between Investors and BorrowerRate grew stronger (from -0.274 to -0.4176) when excluding the nearly 25% of loans with only one investor. I am curious whether this holds throughout the period of time covered by the dataset.
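The comparison of the two coefficients can be sketched as follows. This is a minimal illustration, not the code behind the reported numbers: the `loans` records are made-up toy values, and `pearson` and `investor_rate_correlation` are hand-rolled helper names. Only the field names `Investors` and `BorrowerRate` come from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy records (hypothetical values, not taken from the Prosper data).
loans = [
    {"Investors": 1, "BorrowerRate": 0.12},
    {"Investors": 1, "BorrowerRate": 0.31},
    {"Investors": 1, "BorrowerRate": 0.22},
    {"Investors": 40, "BorrowerRate": 0.28},
    {"Investors": 120, "BorrowerRate": 0.19},
    {"Investors": 250, "BorrowerRate": 0.09},
]


def investor_rate_correlation(records):
    """Correlation between investor count and borrower rate for the given loans."""
    investors = [r["Investors"] for r in records]
    rates = [r["BorrowerRate"] for r in records]
    return pearson(investors, rates)


# Correlation over every loan, then over loans with more than one investor.
all_loans = investor_rate_correlation(loans)
multi_investor = investor_rate_correlation(
    [r for r in loans if r["Investors"] > 1]
)
```

In this toy data the one-investor loans span the full range of rates, so they dilute the relationship; dropping them leaves a coefficient of larger magnitude, which is the same direction of change seen in the real figures.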


An explosion of one-investor loans occurs in early 2013, and I wonder why. Seeing them reach over 4000 loans per month when the previous high was just over 200 is shocking, to say the least. I want to investigate investor behavior further by looking at investor relationships with other variables.

Correlation Coefficient of Investors and avgCredit: 0.2831
While the plot suggests some relationship, Investors per loan is not very highly correlated with credit score. My guess is that the plot appears to show a stronger relationship because of the high number of one-investor loans in the 650-750 credit score range.

Correlation Coefficient of Investors and DebtToIncomeRatio: 0.004068

Correlation Coefficient of Investors and LoanOriginalAmount: 0.3803
Investors do not seem to factor Debt to Income ratio into investment decisions at all, with a correlation that low. It does however seem that higher loan amounts are likely to attract a higher number of investors. I suspect that this is due to investors wanting to spread the risk of default.
The following plots will be faceted for the purpose of separating the data only, rather than for comparing the data among the faceted variable.

Correlation Coefficient of avgCredit and TotalCreditLinespast7years: 0.1018

Correlation Coefficient of avgCredit and InquiriesLast6Months: -0.271
It would seem that neither credit inquiries in the last 6 months nor total credit lines in the last 7 years is strongly correlated with credit score.
Delinquencies Summary:

Number of borrowers with zero delinquencies in the last 7 years: 76281
Number of borrowers with at least one delinquency in the last 7 years: 36465
Percentage of borrowers with Zero or One delinquency in the last seven years: 70.6 %
It is interesting to see that those borrowers with only zero or one delinquency in 7 years (which make up the majority of the dataset) have a measurably higher average credit score than those with more delinquencies. Borrowers with 3-50 delinquencies have approximately the same median credit score and borrowers with approximately 50-80 delinquencies in the last seven years generally have the same median credit score. This suggests that credit rating agencies group people when factoring this statistic into credit scores.


Correlation Coefficient of avgCredit and DelinquenciesLast7Years with “ScoreX”: -0.2596
Correlation Coefficient of avgCredit and DelinquenciesLast7Years with “FICO08”: -0.3072
While there appears to be a downward trend in average credit score as delinquencies go up, the two variables are only weakly correlated. The biggest difference in these numbers is between those with zero or one delinquency and all others.

OpenCreditLines Summary:
The most notable feature of the plot above is that those borrowers with very few open lines of credit actually have lower average credit scores, suggesting that some debt is actually good. I wonder if each credit scoring system treats open credit lines the same.

Interestingly, the two systems differ slightly when it comes to open credit lines. While ScoreX slightly rewards those with more open credit lines than fewer, FICO08 shows slightly lower average credit scores for those borrowers with 3-5 open credit lines as compared to both those with none and 6+ alike.

OpenRevolvingAccounts Summary:

The effect of OpenRevolvingAccounts on credit score is very similar to that of OpenCreditLines, which makes sense as revolving accounts are likely included in all credit lines. Oddly though, we don’t see a similar effect on credit score with revolving accounts under FICO08 as we did with all credit lines. It appears that under this calculation, the number of open revolving accounts is factored into the credit score very little, if at all.

Correlation Coefficient of OpenRevolvingMonthlyPayment and RevolvingCreditBalance: 0.761
As expected, OpenRevolvingMonthlyPayment and RevolvingCreditBalance are highly correlated, with much of the variance having to do with interest rate and how the debt is spread among open accounts.

Correlation Coefficient of BankcardUtilization and AvailableBankcardCredit: -0.3507
Correlation Coefficient of OpenRevolvingMonthlyPayment and BankcardUtilization: 0.2979

Having some available credit is better than none, but there are diminishing returns after $10,000-$20,000 of available credit.

When comparing many of the credit metrics to credit score, it would appear once again that some debt is good. The global peak (or at least local peak) credit score is often around the first quartile of many of the metrics, such as OpenRevolvingMonthlyPayment, OpenRevolvingAccounts, BankcardUtilization and DebtToIncomeRatio. While the faceting above was simply intended to account for the fact that each system derived scores slightly differently, some really interesting differences were shown. Each model seems to factor Debt To Income Ratio and Open Credit Accounts very differently.

Correlation Coefficient of AvailableBankcardCredit and BankcardUtilization: -0.3507
Sadly, given that other variables have an effect on the plot above, I struggled to find anything meaningful or interesting.

Correlation Coefficient of DelinquenciesLast7Years and BankcardUtilization: -0.02973
While I would have expected those with more delinquencies to have higher Bankcard Utilization (both indicators of poor credit management), it seems that the two variables are not related.

I expected homeowners to have more delinquencies, as they likely have a larger debt burden than most non-homeowners, but they in fact have fewer. This is a more nuanced discussion, however. Are those with higher delinquencies less likely to own a home due to poorer credit management? Does owning a home condition a person to pay bills on time? This definitely requires more investigation.
Correlation Coefficient of avgCredit and DelinquenciesLast7Years: -0.2627
Delinquencies and average credit score are not highly correlated and after plotting delinquencies by Prosper Credit Grade, I see why. Because of the number of borrowers with zero delinquencies in the last 7 years, the median number of delinquencies across all grades is zero, which skews any potential correlation. The clear difference we see is in the third quartiles of the poorer credit grades.

While delinquencies are not highly correlated with credit score, they do seem to be factored into human decisions about credit viability. There is a clear decline in the average number of open non-revolving lines of credit as delinquencies increase. As non-revolving lines of credit are typically approved by an underwriter who would look at more factors than just a credit score, it is reasonable to infer that this decline is deliberate.

Similarly, we see that fewer investors are likely to invest in a single loan if the borrower has a higher number of delinquencies.
Bivariate Analysis
Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?
The analysis revealed two major shifts in the dataset that had an effect on a number of the variables throughout. First, in 2009 Prosper changed its proprietary, letter-based credit rating system. Under its old system, each letter grade was almost equally likely to have been assigned, with the most frequent grades being “B”, “C” and “D”. Under the new system, the distribution is more normal, and while “C” is still the most frequently assigned grade, the highest and lowest grades, “AA” and “HR”, were assigned much less frequently than all other grades.
Second, in 2013 Prosper migrated its source of credit scores from Experian ScoreX to the more widely used FICO08. The credit scores in each source are calculated in very different ways, so it is almost impossible to conduct any analysis of scores across the entire dataset without separating by which source was used to create the score. The dataset placed borrowers into groups by credit score, and while the groups in each source were generally 20 points apart, the old source gave scores all the way from 0 to 899 but the new source gave scores from 640 to 859. It is highly unlikely that Prosper simply decided to limit borrowers to that credit score range after the switch, so we can assume that it is just a fundamentally different rating system.
The original Prosper grading system appears to have relied heavily on credit score. Under the new rating system, borrowers with higher credit scores generally have higher Prosper grades, but the system seems to be much more nuanced, factoring other variables and not just credit score. Regardless of rating system, borrowers with higher grades garner more investors per loan, on average.
A loan’s amount and term do not appear to have an appreciable effect on the loan’s interest rate. Debt-to-income ratio seems to be factored in slightly, but Credit Score and Prosper Credit Grade are the most highly correlated with interest rate. While those with lower credit scores generally have higher interest rate loans, the average credit score among those with the highest interest rate loans (>30%) is actually slightly higher than among those with loans just below 30%.
Credit scores are likely calculated using many of the “credit history” and “credit health” variables that exist in this dataset, but outside of OpenRevolvingMonthlyPayment, I was unable to find many strong correlations. However, I did notice an interesting phenomenon when comparing many of the variables to credit score, where a local maximum would present itself around the first quartile of the variable. These variables are all indicators of current debt or credit utilization, suggesting that carrying a small amount of debt may actually lead to slightly higher credit scores than would be expected.
Finally, while I faceted some of the plots above by Credit Score Source in order to more accurately analyze a particular variable’s effect on the score, some interesting differences between the models were shown. For instance, each model seems to factor Debt To Income Ratio and Open Credit Accounts very differently.
Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?
I was fascinated that nearly 1/3 of borrowers in the dataset had at least one delinquency in the last 7 years. Among those with at least one, the average number of open non-revolving credit lines declines as the number of delinquencies in the last 7 years goes up. While this may be due to better credit management, my guess is that those with a greater number of delinquencies are less likely to be approved for a non-revolving line of credit. This theory can be defended by looking at the number of investors per loan compared to number of delinquencies. Those borrowers with fewer than five delinquencies are much more likely to receive a higher number of investors than those with more than five.
On the topic of investors, it was also interesting to see the median investors per loan over time. There appears to have been a huge shift in early 2013 driving down median investors per loan to as low as 1 in 2014. This is confirmed when we look at the number of “One Investor Loans” per month and see massive increases in 2013 and 2014.
What was the strongest relationship you found?
By far, the strongest relationship I found was that of Credit Scores and Prosper Credit Grade with the old rating system. With a correlation coefficient of -0.98, it is very likely that the rating system was simply an aggregation of groups of borrowers based on credit score alone. In fact, there are only two instances where a credit grade’s minimum credit score matches that of the lower grade’s maximum.
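A grade that is a pure aggregation of credit score would behave like a step function over fixed score bands. The sketch below shows what such a mapping could look like. The cut points in `CUT_POINTS` are hypothetical values picked for illustration, since the boundaries Prosper actually used are not stated in this analysis, and `grade_from_score` is an illustrative helper name.

```python
from bisect import bisect_right

# Hypothetical band boundaries, for illustration only; the real boundaries
# used by the old Prosper rating system are not given in this analysis.
CUT_POINTS = [560, 600, 640, 680, 720, 760]

# Letter grades from worst to best, one per band.
GRADES = ["HR", "E", "D", "C", "B", "A", "AA"]


def grade_from_score(score):
    """Map a credit score to a letter grade using fixed, non-overlapping bands."""
    return GRADES[bisect_right(CUT_POINTS, score)]


# Each score lands in exactly one band, so adjacent grades never share scores.
examples = {score: grade_from_score(score) for score in (500, 639, 640, 760)}
```

Under a rule like this, the score ranges of adjacent grades barely touch, which is the pattern the minimum and maximum scores per grade suggest for the old system.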
Multivariate Plots

Looking at this plot alone, it is easy to be confused as to why the distributions are so different between the two credit scoring systems. What caused investor behavior to change so much after the change to FICO08?
Investor Summary for “ScoreX” Loans:
Investor Summary for “FICO08” Loans:
Investors per Day in the “ScoreX” Era: 2879
Investors per Day in the “FICO08” Era: 4972
As we have seen previously, there was a massive explosion of one investor loans in early to mid-2013. The switch to FICO08 occurred in late 2013, so it is unlikely to be the cause of the major shift in investor behavior. Further skewing the plot is the fact that the site saw much greater activity in the FICO08 era than before the change. It is no wonder we see such a disparity in the median and mean investors per loan during this time.

Consistently, we see that nearly every median investors per loan by credit score is one. Given the differences we’ve seen between the median and mean, I am curious to see what the distribution of means by credit score would look like.

We do again see that mean investors per loan is down in the FICO08 era as compared with the ScoreX era. However, it is good to see a definitive increase in mean investors per loan as credit scores go up. This is to be expected and likely due to higher investor confidence.
Percentage of borrowers with a credit score above 825: 1.942 %
With such a low percentage of borrowers with a credit score above 825, it is likely that the decline we see in the plot above is either related to some anomaly in the data or a regression to a more appropriate level of investors per loan.
Loans Per Term:

Faceting the loans by term, we see that loan amounts from $3500-6000 command the highest interest rates in their group, with slightly higher and slightly lower loan amounts commanding lower rates. The spread of interest rates within each original loan amount group also seems to vary by term, even though some of the median rates in each original loan group do not seem to follow a pattern term to term.

We would expect to see a good deal of linearity when comparing Loan Amount to Monthly Payment. Obviously the majority of the variance in this case is due to the interest rate. The 36 month loans do seem to have a higher variance than the other loans, so it will be interesting to see if this holds true.
12 Month Loan APR Variance: 0.008147
12 Month Loan APR Summary:
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.04935 | 0.1466 | 0.2203 | 0.2162 | 0.2917 | 0.3584 |
36 Month Loan APR Variance: 0.007312
36 Month Loan APR Summary:
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.00653 | 0.1486 | 0.2097 | 0.2195 | 0.2926 | 0.5123 |
60 Month Loan APR Variance: 0.003304
60 Month Loan APR Summary:
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.07111 | 0.1718 | 0.2093 | 0.2168 | 0.2572 | 0.3584 |
Surprisingly, 12 month loans had greater variability than 36 or 60 month loans.
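A per-term variance comparison like the one above can be sketched as follows. The `loans` pairs are invented for illustration and do not come from the dataset. `statistics.variance` computes the sample variance (n - 1 denominator), which is an assumption about how the figures above were calculated.

```python
from collections import defaultdict
from statistics import variance

# Hypothetical (term in months, APR) pairs, for illustration only.
loans = [
    (12, 0.10), (12, 0.30), (12, 0.20),
    (36, 0.15), (36, 0.25), (36, 0.21),
    (60, 0.18), (60, 0.22), (60, 0.20),
]

# Group APR values by loan term.
apr_by_term = defaultdict(list)
for term, apr in loans:
    apr_by_term[term].append(apr)

# Sample variance of APR within each term group.
apr_variance = {term: variance(aprs) for term, aprs in apr_by_term.items()}
```

Comparing the entries of `apr_variance` side by side answers the question directly, without relying on how spread out the points look in a scatterplot.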

Given that we know how correlated interest rate and Prosper Credit Grade are, it is not surprising to see how similar these plots appear. This plot however gives a clearer view of the separation between the groups of borrowers on the 36 and 60 month loans.
Among Debt Consolidation Borrowers, What is the Relationship Between Open Credit Lines and Debt to Income Ratio? 
I would expect that as Open Credit Lines increases, so too will Debt to Income Ratio. However, even with jittering and changing the alpha level of the points, I don’t feel that this plot gives us an idea of the relationship between the two. My sense is that a boxplot would do a better job of this.

With this plot we can see that non-homeowners generally have higher debt to income ratio per open credit line than homeowners.
Correlation Coefficient of MonthlyLoanPayment and BorrowerAPR: -0.2274

Given how integral a loan’s interest rate is to its monthly payment, I would have expected a higher correlation between the two. Obviously a loan’s term and original amount are much bigger factors in the calculation of a monthly payment. The plot above has some really interesting discreteness that shows what we would expect, which is lines representing common original loan amounts moving in the plot as the interest rate affects the ultimate monthly payment. The most interesting thing to me, however, is how the pitch and variability of these lines change with each Prosper Credit Grade group and as the loan amount increases. First, there is generally much less variability in potential interest rates among higher graded borrowers. The lowest graded group has interest rates ranging from 5% to over 40%, though they are much less likely to be approved for higher loan amounts. Next, in each Prosper Credit Grade facet, we see that the pitch of the discrete lines sharpens as the original loan amount increases. This is because increases in the interest rate will cause larger increases in monthly payments for larger loans. This is not surprising, but still interesting to see plotted.
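The steeper pitch for larger loans follows from the standard level-payment formula. The sketch below assumes the loans are fixed-rate and fully amortizing with monthly compounding, which this analysis does not confirm for Prosper. `monthly_payment` and `payment_increase` are illustrative helper names, and the amounts and rates are example values.

```python
def monthly_payment(principal, annual_rate, term_months):
    """Level monthly payment on a fully amortizing fixed-rate loan.

    Uses payment = P * r / (1 - (1 + r) ** -n), with r the monthly rate
    and n the number of monthly payments.
    """
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        return principal / term_months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -term_months)


def payment_increase(principal, low_rate, high_rate, term_months=36):
    """Extra monthly payment caused by moving from low_rate to high_rate."""
    return (monthly_payment(principal, high_rate, term_months)
            - monthly_payment(principal, low_rate, term_months))


# Same rate change (10% to 30%) applied to a small and a large loan.
small = payment_increase(4000, 0.10, 0.30)
large = payment_increase(25000, 0.10, 0.30)
```

Because the payment is proportional to the principal, the same change in rate moves the payment on the larger loan by a proportionally larger dollar amount (here 25000 / 4000 = 6.25 times as much), which is why the lines for larger loan amounts tilt more sharply.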

There are a few interesting things to look at in this plot. First, we see visually the difference in average credit score assigned to each Prosper Credit Grade under the old and newer rating systems. Given what we know about the differences in scores issued under ScoreX and FICO08, I would have expected to see a much bigger difference between the two under the same Prosper Grading system. Next, while it does not seem that debt to income ratio was factored heavily into credit under ScoreX, we see some odd trends in the FICO08 data. The credit scores of the three highest graded groups seem to increase with a modest amount of credit utilization, but drop sharply above 50% debt to income ratio. The scores of lower graded borrowers (“E” and “HR”), however, see more consistent improvements to credit score with increases in credit utilization, which is counter-intuitive.

I wonder if we will see different results when splitting this by loan amount.

Given that investors make money on the interest of loans, I would have expected to see more investors per loan as interest rates increased. That we essentially see the opposite on loans with an interest rate above 10% is indicative of the fact that higher risk borrowers receive higher interest loans, which makes them less desirable to investors.

In all instances except the lowest credit score range, the new credit score source meant lower mean interest rates and less variability within each score group. Given we know that the scores from each of these systems are calculated very differently and have a different range of possibilities, I believe that the variance we see is due to different populations occupying the score ranges, rather than the new system being used by Prosper to issue lower interest rates for the same credit score range.

Correlation Coefficient of avgCredit and LoanOriginalAmount: 0.3522
A borrower’s credit score obviously does not dictate the size of the loan they receive, but it does appear that borrowers with higher credit scores are afforded the opportunity to receive a wider range of loan amounts than borrowers with lower scores. For example, the median loan for borrowers with a credit score lower than 630 was less than $4,000, whereas borrowers with higher credit scores received median loans as high as $15,000. The variance around this median generally increases with credit score.

It is interesting to see the change in variability of interest rates and credit scores among each credit rating group. Whereas the group rated “AA” generally all have high credit scores and low interest rates, “C” and “D” rated borrowers have much more variability in both metrics. “HR” rated borrowers are generally relegated to the highest interest rates, even though some in this group have credit scores similar to those in the “AA” group.

Regardless of credit score, a borrower’s interest rate typically increases as their Prosper Credit Grade decreases.

While loans with borrowers that have higher Debt to Income Ratios generally attract fewer investors per loan, oddly, Non-Debt Consolidation loans do not suffer this decline as drastically. As we have seen, those borrowers securing debt consolidation loans have a higher debt to income ratio at the time of application. One would assume that a high debt to income ratio would not affect investor behavior on these loans as much as with other loans, yet it does. One possible reason for this decline is that investors may actually feel more confident in the Debt Consolidation Loans in general and thus are comfortable investing a higher percentage of the original loan amount.

I wanted to find out how differently each credit score provider factored delinquencies into the score calculation. We previously found that the biggest difference in credit score with regard to delinquencies was between borrowers with zero or one delinquency and everyone else. We see this again here, but interestingly, it appears that scores under the FICO08 system take a much bigger hit with 2 or more delinquencies. With this system, not only are the highest scores reserved for those with few or no delinquencies, but those with 10 or more in the last seven years have credit scores that are effectively capped between 700 and 725 and are much more likely to be in the 600s. On the flip side, borrowers with more than 30 delinquencies under the ScoreX system have scores ranging from 550 to over 800!

Correlation Coefficient of avgCredit and OpenRevolvingMonthlyPayment: 0.1373
I initially wanted to see if open credit monthly payments negatively affected credit score, but quickly realized that someone making $1000/month would be much more affected by a $500 monthly credit card bill than would someone making $15,000/month. For the plot above, I split the data into the top half and bottom half of earners by stated monthly income. I expected to see a significant decline in average credit score among the bottom half of earners as the actual dollar amount of monthly revolving payment went up. Oddly, it stayed relatively flat and even matched the trend of the top half earners. I suspect that this is not something factored into score as much as it is factored into loan eligibility and underwriting.
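The split into top and bottom earners can be sketched as below. The `borrowers` tuples are invented for illustration, and `summarize` is an illustrative helper name. The payment-to-income share is an extra column added here to make the "$500 bill on a $1000 income" point concrete; it is not a field in the dataset.

```python
from statistics import mean, median

# Hypothetical borrowers, for illustration only:
# (stated monthly income, open revolving monthly payment, credit score)
borrowers = [
    (1000, 500, 640),
    (2500, 200, 680),
    (4000, 300, 700),
    (6000, 450, 720),
    (9000, 500, 740),
    (15000, 500, 760),
]

# Median income separates the bottom half of earners from the top half.
income_cutoff = median([income for income, _, _ in borrowers])
bottom_half = [b for b in borrowers if b[0] <= income_cutoff]
top_half = [b for b in borrowers if b[0] > income_cutoff]


def summarize(group):
    """Mean credit score and mean share of income spent on revolving payments."""
    return {
        "mean_score": mean(score for _, _, score in group),
        "mean_payment_share": mean(payment / income for income, payment, _ in group),
    }


bottom_summary = summarize(bottom_half)
top_summary = summarize(top_half)
```

Plotting score against the payment share rather than the raw dollar payment would be one way to test whether the burden relative to income, instead of the dollar amount, is what relates to credit score.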

When looking at the plot, four facets stand out to me as different from the rest. #4 (Personal Loan), #8 (Baby & Adoption), #10 (Cosmetic Procedure) and #11 (Engagement Ring) are the four that I think call out for further investigation. While the comedian in me wants to show the variability or positive relationship in these facets as consistent with other poor life choices, as shown by the type of loan applied for, I believe the actual answer is much simpler and less funny. In total, these four loan categories account for just over 2.5% of the dataset, most of which is personal loans. With more examples, I would expect to see these plots exhibit regression to the mean as with the other plots.

I was curious to see if, on average, the original loan amount increases when stated monthly income increases. The thought is that those with more money either need more money to maintain a lifestyle or are eligible to receive (and more importantly pay back) more money. Faceting by loan type seems to have helped highlight this as the biggest differences in loan amount between lower income earners and higher earners were the following loan categories: Debt Consolidation, Home Improvement, Business, Large Purchase, and Wedding. With all of these categories, loans are likely higher among top income earners in order to maintain a lifestyle (Debt Consolidation, Home Improvement, Large Purchase, Wedding) or because the borrower is more likely to produce a return on investment (Debt Consolidation, Home Improvement, Business).

Other than loans that did not report a category, Student Loans were the only group where the borrower’s debt to income ratio went up as the borrowed loan amount went up.

As loan type would not affect the relationship between two credit variables that were recorded before the loan was processed, each of these looks very much the same.

The number of delinquencies in a borrower’s credit history seems to affect credit score in a similar fashion among a given score provider’s data, regardless of whether the borrower was a homeowner. It seems as though all debt is the same when it comes to this part of the credit score calculation.

Delinquency Summary of Borrowers with the Old Rating System :
Delinquency Summary of Borrowers with the New Rating System:
The new Prosper Rating System is much less forgiving of delinquencies in the borrower’s recent history. Not only are the medians equal or lower across the board, but the third quartile numbers among the lower rated borrowers are also much lower with the new system. One example that highlights this change is that, looking only at the comparison of these variables, “C” rated borrowers in the old system appear to share compositional characteristics with “E” rated borrowers in the new system. I found many articles suggesting that Prosper has struggled with repayment of its investors, so I wonder if the new rating system weighed delinquency more heavily to combat this.

Higher rates of delinquencies don’t necessarily mean fewer investors per loan for a given Prosper Credit Grade, but each subset does see greater variability in investors per loan as the borrower’s delinquencies go up.
Investor Summary:
Investor Summary of “ScoreX” Loans:
Investor Summary of “FICO08” Loans:


Prosper Grade Linear Regression with “ScoreX” Loans:
| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 22.120*** (0.131) | 22.190*** (0.131) | 22.521*** (0.132) |
| log(avgCredit) | -3.193*** (0.020) | -3.207*** (0.020) | -3.266*** (0.020) |
| AvailableBankcardCredit | -0.000*** (0.000) | -0.000*** (0.000) | -0.000*** (0.000) |
| BankcardUtilization | 0.097*** (0.005) | 0.094*** (0.005) | 0.100*** (0.005) |
| InquiriesLast6Months | 0.006*** (0.001) | 0.006*** (0.001) | 0.006*** (0.001) |
| DelinquenciesLast7Years | 0.002*** (0.000) | 0.002*** (0.000) | 0.002*** (0.000) |
| OpenCreditLines | 0.000 (0.000) | -0.002*** (0.000) | -0.001* (0.000) |
| as.numeric(FirstRecordedCreditLine) | 0.000*** (0.000) | 0.000*** (0.000) | 0.000*** (0.000) |
| OpenRevolvingMonthlyPayment | 0.000*** (0.000) | 0.000*** (0.000) | 0.000*** (0.000) |
| RevolvingCreditBalance | -0.000*** (0.000) | -0.000*** (0.000) | -0.000*** (0.000) |
| TotalCreditLinespast7years | | 0.001*** (0.000) | 0.002*** (0.000) |
| DebtToIncomeRatio | | | 0.026*** (0.002) |
| R-squared | 0.475 | 0.476 | 0.509 |
| adj. R-squared | 0.475 | 0.476 | 0.509 |
| sigma | 0.390 | 0.390 | 0.378 |
| F | 7784.336 | 7027.846 | 6692.438 |
| p | 0.000 | 0.000 | 0.000 |
| Log-likelihood | -37014.548 | -36956.756 | -31588.568 |
| Deviance | 11793.671 | 11776.074 | 10119.826 |
| AIC | 74051.097 | 73937.511 | 63203.136 |
| BIC | 74152.922 | 74048.593 | 63322.337 |
| N | 77409 | 77409 | 70919 |

Old Prosper Rating Linear Regression:
| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 18.421*** (0.018) | 19.525*** (0.022) | 19.560*** (0.025) |
| avgCredit | -0.022*** (0.000) | -0.024*** (0.000) | -0.024*** (0.000) |
| AvailableBankcardCredit | | 0.000*** (0.000) | 0.000*** (0.000) |
| BankcardUtilization | | | -0.016* (0.006) |
| R-squared | 0.958 | 0.967 | 0.967 |
| adj. R-squared | 0.958 | 0.967 | 0.967 |
| sigma | 0.379 | 0.324 | 0.324 |
| F | 645122.395 | 311164.891 | 207589.217 |
| p | 0.000 | 0.000 | 0.000 |
| Log-likelihood | -12667.381 | -6256.871 | -6201.343 |
| Deviance | 4055.038 | 2248.860 | 2234.677 |
| AIC | 25340.762 | 12521.742 | 12412.687 |
| BIC | 25365.507 | 12553.628 | 12452.531 |
| N | 28233 | 21408 | 21350 |

As we would expect and have shown previously, the distribution of credit grades is somewhat normal, with “C” rated borrowers being the most frequent. Also, with each credit grade, borrowers with lower scores comprise a larger percentage of the group and those with higher scores comprise a smaller one. None of this is surprising given our previous findings, but it is interesting to see the composition visually.

Throughout the analysis, I became intrigued by the question of how Investor behavior is affected by various factors, events, or variables. The plot above shows what proportion of total investors per month was allocated to each Prosper Credit Grade. In other words, which letter grades draw the attention (and funds) of the investors, and how has this changed over time? Surprisingly, we see major changes throughout the data, where loans of a given grade become more or less popular. If I were to push this further, I might look at the major changes that happened on the site to see if they coincide with the shifts we see. This would include changes to the rating systems or any legal action, which is what I suspect caused the huge gap around month 50.
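
The proportion described above can be computed by summing investors per month and per grade, then dividing each grade's total by that month's total. The following is a minimal Python sketch of that calculation, not the code used to produce the plot; the function name, the tuple layout, and the sample rows are all made up for illustration.

```python
from collections import defaultdict

def investor_share_by_month(loans):
    """Return {month: {grade: share of that month's investors}}.

    `loans` is an iterable of (month, grade, investors) tuples; this layout
    is an assumption for the sketch, not the dataset's actual schema.
    """
    totals = defaultdict(float)
    by_grade = defaultdict(lambda: defaultdict(float))
    for month, grade, investors in loans:
        totals[month] += investors
        by_grade[month][grade] += investors
    # Skip months with no investors at all to avoid dividing by zero.
    return {
        month: {grade: count / totals[month] for grade, count in grades.items()}
        for month, grades in by_grade.items()
        if totals[month] > 0
    }

# Made-up rows: (month index, Prosper grade, investors on the loan)
sample = [(1, "A", 120), (1, "C", 60), (1, "C", 20), (2, "A", 10), (2, "HR", 30)]
shares = investor_share_by_month(sample)
```

Because each month's shares sum to one, a month with very few loans can swing sharply, which is worth keeping in mind when reading abrupt shifts in the plot.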
Using a linear model to explain the variation in Prosper Credit Grade

I subset the data to include only loans with the newest rating system and credit scores from FICO08, so I can try to come up with a formula that would produce an estimate for Prosper Credit Grade during this era. Applying a log transformation to Prosper Credit Grade and avgCredit produced a plot that appeared to be the most linear, which will hopefully give us a decent base on which to create a linear model. The plot below will help determine other variables that will help refine the model.

| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 33.222*** (0.311) | 33.199*** (0.327) | 32.977*** (0.328) |
| log(avgCredit) | -4.951*** (0.047) | -4.948*** (0.050) | -4.921*** (0.050) |
| AvailableBankcardCredit | -0.000*** (0.000) | -0.000*** (0.000) | -0.000*** (0.000) |
| BankcardUtilization | 0.087*** (0.009) | 0.087*** (0.009) | 0.097*** (0.009) |
| DebtToIncomeRatio | 1.086*** (0.016) | 1.086*** (0.016) | 1.089*** (0.016) |
| InquiriesLast6Months | 0.088*** (0.002) | 0.088*** (0.002) | 0.088*** (0.002) |
| DelinquenciesLast7Years | | 0.000 (0.000) | 0.000 (0.000) |
| as.numeric(FirstRecordedCreditLine) | | | 0.000*** (0.000) |
| R-squared | 0.514 | 0.514 | 0.514 |
| adj. R-squared | 0.514 | 0.513 | 0.514 |
| sigma | 0.329 | 0.329 | 0.329 |
| F | 5671.893 | 4726.420 | 4063.001 |
| p | 0.000 | 0.000 | 0.000 |
| Log-likelihood | -8255.884 | -8255.858 | -8235.549 |
| Deviance | 2908.254 | 2908.249 | 2903.855 |
| AIC | 16525.769 | 16527.717 | 16489.098 |
| BIC | 16583.159 | 16593.305 | 16562.884 |
| N | 26864 | 26864 | 26864 |
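
If the response in the table above is the log of a numerically coded Prosper grade, as the log transformation described earlier suggests, then the log(avgCredit) coefficient can be read as an elasticity. The sketch below shows that reading in Python for the first model's coefficient of -4.951. It assumes larger grade numbers mean riskier grades, which the coefficient signs suggest but the text does not state. The function name is illustrative.

```python
def multiplicative_effect(beta, pct_increase):
    """Factor applied to y when x rises by pct_increase percent,
    under log(y) = a + beta * log(x) with other predictors held fixed."""
    return (1 + pct_increase / 100) ** beta

# Coefficient on log(avgCredit) from the first model in the table above.
factor = multiplicative_effect(-4.951, 1.0)
pct_change = (factor - 1) * 100
print(f"{pct_change:.2f}% change in the numeric grade per 1% rise in credit score")
```

Under these assumptions, a 1% higher average credit score corresponds to a numeric grade a little under 5% lower, holding the other predictors fixed.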
Multivariate Analysis
Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?
During the span of time covered by the dataset, Prosper changed how its proprietary grading system was calculated as well as its provider of credit scores. After both changes, large changes in investor behavior could be observed. The changes to its grading system and credit score provider likely coincided with or even inspired a boom in investor confidence. While the number of individual investments per day went up significantly after the change in providers, from 2879 per day to 4972, the median investors per loan went from 62 during the ScoreX era to 1 after FICO08 was introduced. Also, with the new Prosper rating system, interest rates per credit grade appear to vary much less than with the old system, indicating that common rates are likely assigned to particular grades and/or that the requirements for each grade had been strengthened. Blog posts around this time indicate that investors with large amounts of capital were flooding to the site and fully funding loans more often than before, which is consistent with what we see in the data.
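
The per-era comparison above comes down to grouping loans by credit score provider and summarizing the investor counts in each group. Here is a minimal Python sketch of that grouping. The sample rows are invented and only echo the direction of the reported medians, and the function name and tuple layout are illustrative assumptions.

```python
from statistics import median

def investor_summary(loans):
    """Group (era, investors) pairs by era and summarize investors per loan."""
    by_era = {}
    for era, investors in loans:
        by_era.setdefault(era, []).append(investors)
    return {
        era: {
            "median_investors": median(counts),
            # Share of loans funded by exactly one investor.
            "single_investor_share": sum(c == 1 for c in counts) / len(counts),
        }
        for era, counts in by_era.items()
    }

# Made-up rows: (credit score provider in use, investors on the loan)
sample = [
    ("ScoreX", 62), ("ScoreX", 40), ("ScoreX", 95),
    ("FICO08", 1), ("FICO08", 1), ("FICO08", 30),
]
summary = investor_summary(sample)
```

Reporting the share of single-investor loans next to the median makes it easier to see whether a median of 1 reflects whole-loan funding by large investors, as the blog posts suggest.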
While the lowest interest rate loans were the most likely to have a high investor count and investors decline as interest rates go up, almost all loan types have increased investors at the higher end of their possible interest rates. I suspect that as interest rates go up, the borrowers generally present more risk to the investors, leading to fewer investors being interested in funding the loans. However, at the higher end of the spectrum, investors likely can’t resist the potential return from the favorable rates.
In all instances except the lowest credit score range, the new credit score source meant lower mean interest rates and less variability within each score group. Given we know that the scores from each of these systems are calculated very differently and have a different range of possibilities, I believe that the variance we see is due to different populations occupying the score ranges, rather than the new system being used by Prosper to issue lower interest rates for the same credit score range.
While loans with borrowers that have higher Debt to Income Ratios generally garner fewer investors per loan, oddly, Non-Debt Consolidation loans do not suffer this decline as drastically. As we have seen, those borrowers securing debt consolidation loans have a higher debt to income ratio at the time of application. One would assume that a high debt to income ratio would not affect investor behavior on these loans as much as with other loans, yet it does. One possible reason for this decline is that investors may actually feel more confident in the Debt Consolidation Loans in general and thus are comfortable investing a higher percentage of the original loan amount. Other than loans that did not report a category, Student Loans were the only group where the borrower’s debt to income ratio went up as the borrowed loan amount went up.
When comparing monthly payment to original loan amount, we see a fairly linear relationship, with the majority of the variation due to APR. When split by term, we see that loans with a 12-month term had the greatest variance in APR, but the 36-month loans had the highest mean and the most extreme outliers, which are also indications of variability. This is verified by the plots.
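
The near-linear relationship is what the standard fixed-payment (amortizing) loan formula predicts: for a given rate and term, the payment is proportional to the amount borrowed, so the remaining spread comes from rate and term. The Python sketch below assumes the quoted annual rate is a nominal rate compounded monthly and ignores fees. The source does not describe how Prosper actually computes payments, so this is only the textbook formula, and the function name is illustrative.

```python
def monthly_payment(principal, annual_rate, term_months):
    """Level monthly payment on a fully amortizing loan.

    Assumes `annual_rate` is a nominal annual rate (0.12 for 12%)
    compounded monthly, with no fees.
    """
    r = annual_rate / 12  # periodic (monthly) rate
    if r == 0:
        return principal / term_months
    return principal * r / (1 - (1 + r) ** -term_months)

# Doubling the principal doubles the payment when rate and term are fixed.
base = monthly_payment(10_000, 0.12, 36)
double = monthly_payment(20_000, 0.12, 36)
```

With rate and term held fixed, each loan amount maps to a single payment on a straight line through the origin. Different rates and terms give lines with different slopes, which would produce the fan-shaped scatter described.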
Were there any interesting or surprising interactions between features?
It was fascinating to see the comparison between debt to income ratio and average credit score. Among most of the Prosper Grade groups, a higher debt to income ratio actually led to a higher average credit score, which is counterintuitive. The credit score provider and Prosper Rating System also had an effect on the relationship, with each combination producing very different results. The highest Prosper graded borrowers were the only group to consistently see a decline in credit score with increased debt to income ratio, which raises the question: Is a little bit of debt actually good for your credit?
The relationship between Open Credit Lines and Debt to Income was much less linear or dramatic than I expected as well. In fact, homeowners in the dataset with 40 or more open lines of credit actually have a lower debt to income ratio on average than those with only 35 open lines of credit. Non-Homeowners with more than 35 open lines of credit generally have a higher debt to income ratio than homeowners, but there is not a significant amount of difference between the two subsets among those with fewer than 35 open lines of credit.
Next, when comparing Delinquencies to Credit Score, it was interesting to see some borrowers that were assigned grades of ‘AA’ or ‘A’ that had much lower credit scores or higher numbers of delinquencies during the last 7 years, yet retained their relatively high grade. Similarly, there were some borrowers with the low grade of ‘E’, yet credit scores of nearly 750 and very few delinquencies. This was much less common after the switch to FICO08 credit scores, however. With the switch to the new rating system and credit score provider, it would appear that standards for each grade were strengthened. One good example of this is seen when looking at delinquencies per Prosper Grade and realizing that borrowers with higher delinquencies were much less likely to be approved for loans on Prosper after the switch.
Finally, it was interesting to see that regardless of the length of the term, those borrowers of loans from $3500-6000 generally received the highest APR. Bigger and smaller loans generally lead to lower APRs, whereas I would have expected this to be more linear.
OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.
After noticing a seemingly linear relationship between credit scores and Prosper credit grades under the old rating system, I created a linear model using each of these variables and two related to bank card utilization. I found that credit score and bank card utilization accounted for 96.7% of the variance in Prosper Credit Grades.
However, this felt like cheating as the old rating system was extremely simplified, so I set out to create a model that would predict the Prosper credit grade with the new rating system as well as the new credit score provider. Unfortunately, even with the inclusion of a number of personal credit variables, the model was only able to account for approximately 51.4% of the variance in the grade. I suspect that the new rating system might have taken into account past activity on Prosper, the variables for which I did not include in my subset of the data.
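
The "share of variance explained" quoted for both models is the R-squared of an ordinary least squares fit. The Python sketch below illustrates it with a single predictor, a log-log fit, and invented data. It is not the multi-variable model reported above, which appears to have been fitted in R. The function name and the numeric grade coding (1 = best, 7 = worst) are assumptions.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up data: credit scores and a hypothetical numeric grade (1 = best).
scores = [620, 660, 700, 740, 780, 820]
grades = [7, 6, 4, 3, 2, 1]
intercept, slope, r_squared = fit_line(
    [math.log(s) for s in scores],
    [math.log(g) for g in grades],
)
```

An R-squared near 1 means the predictors almost fully determine the grade, as with the old rating system. A value near 0.5 leaves about half the variation to factors outside the model, consistent with the suspicion that the new system uses information not included in the subset.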